5 research outputs found

A Sweet Recipe for Consolidated Vulnerabilities: Attacking a Live Website by Harnessing a Killer Combination of Vulnerabilities

Author: Ansary MD. Nazmuddoha
Islam A. B. M. Alim Al
Islam Mazharul
Nurain Novia
Shams Salauddin Parvez
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/06/2019

The recent emergence of new vulnerabilities is an epoch-making problem in the complex world of website security. Most of the websites are failing to keep updating to tackle their websites from these new vulnerabilities leaving without realizing the weakness of the websites. As a result, when cyber-criminals scour such vulnerable old version websites, the scanner will represent a set of vulnerabilities. Once found, these vulnerabilities are then exploited to steal data, distribute malicious content, or inject defacement and spam content into the vulnerable websites. Furthermore, a combination of different vulnerabilities is able to cause more damages than anticipation. Therefore, in this paper, we endeavor to find connections among various vulnerabilities such as cross-site scripting, local file inclusion, remote file inclusion, buffer overflow, CSRF, etc. To do so, we develop a Finite State Machine (FSM) attacking model, which analyzes a set of vulnerabilities towards the road to finding connections. We demonstrate the efficacy of our model by applying it to the set of vulnerabilities found on two live websites.

Comment: Accepted at 5th International Conference on Networking, Systems and Security (5th NSysS 2018)

arXiv.org e-Print Archive

Crossref

bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents

Author: Abedin Jawaril Munshad
Ansary MD. Nazmuddoha
Farabe Md. Zami Al Zunaed
Haque Marsia
Hasan Beig Rajibul
Islam Shayekh Bin
Mobassir Syed
Sadeque Farig
Shawon Md. Mehedi Hasan
Shihab Istiak
Sushmit Asif
Zulkarnain Imam Mohammad
Publication date: 21/08/2023

Despite the existence of numerous Optical Character Recognition (OCR) tools, the lack of comprehensive open-source systems hampers the progress of document digitization in various low-resource languages, including Bengali. Low-resource languages, especially those with an alphasyllabary writing system, suffer from the lack of large-scale datasets for various document OCR components such as word-level OCR, document layout extraction, and distortion correction; which are available as individual modules in high-resource languages. In this paper, we introduce Bengali.AI-BRACU-OCR (bbOCR): an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format that leverages a novel Bengali text recognition model and two novel synthetic datasets. We present extensive component-level and system-level evaluation: both use a novel diversified evaluation dataset and comprehensive evaluation metrics. Our extensive evaluation suggests that our proposed solution is preferable over the current state-of-the-art Bengali OCR systems. The source codes and datasets are available here: https://bengaliai.github.io/bbocr

arXiv.org e-Print Archive

BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

Author: Ahmed Intesur
Ansary Md. Nazmuddoha
Chowdhury Sayma Sultana
Dhruvo Shahriar Elahi
Dip Souhardya Saha
Emon Mahfuzur Rahman
Haque Md. Rezwanul
Hasan Md. Rakibul
Hossen Syed Mobassir
Humayun Ahmed Imtiaz
Meghla Marsia Haque
Pavel Akib Hasan
Rakib Fazle Rabbi
Reasat Tahsin
Sadeque Farig
Shihab Md. Istiak Hossain
Sushmit Asif Shahriyar
Publication date: 10/03/2023

While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking the performance of existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset in training deep learning based Bengali document digitization models.

arXiv.org e-Print Archive

High-Resolution Intertidal Topography from Sentinel-2 Multi-Spectral Imagery: Synergy between Remote Sensing and Numerical Modeling

Author: A.K.M. Saiful Islam
Allain
Fabien Durand
Fabrice Papa
Frazier
Laurent Testut
Marufa Ishaque
Mason
Md Jamal Uddin Khan
MD Nazmuddoha Ansary
Sciortino
Stéphane Calmant
Work
Yann Krien
Publication venue: 'MDPI AG'

Crossref

OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

Author: Alam Samiul
Ansary Md. Nazmuddoha
Chowdhury Sayma Sultana
Dip Souhardya Saha
Hossen Syed Mobassir
Humayun Ahmed Imtiaz
Mamun Mamunur
Meghla Marsia Haque
Rakib Fazle Rabbi
Reasat Tahsin
Sadeque Farig
Shihab Md. Istiak Hossain
Sushmit Asif
Tasnim Nazia
Publication date: 15/05/2023

We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, Bengali portrays large diversity in dialects and prosodic features, which demands ASR frameworks to be robust towards distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that is significantly different from regular speech. Our training dataset is collected via massively online crowdsourcing campaigns which resulted in 1177.94 hours collected and curated from 22,645 native Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of speech collected and manually annotated from 17 different sources, e.g., Bengali TV drama, Audiobook, Talk show, Online class, and Islamic sermons to name a few. OOD-Speech is jointly the largest publicly available speech dataset, as well as the first out-of-distribution ASR benchmarking dataset for Bengali.

arXiv.org e-Print Archive